{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "AutoML searches for the optimal model and its parameter combination through several trials. At the end of the experiment, in addition to returning the optimal model, **model ensemble** can also be performed on ``topk`` to improve the generalization of the pipeline.\n",
    "\n",
    "In HyperTS, we introduce a mechanism called GreedyEnsemble for model ensemble. Its specific process is as follows[1]:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Start with the empty ensemble.\n",
    "\n",
    "2. Add to the ensemble the model in the library that maximizes the ensemble’s performance to the error metric on a hillclimb (validation) set.\n",
    "\n",
    "3. Repeat Step 2 for a fixed number of iterations or until all the models have been used.\n",
    "\n",
    "4. Return the ensemble from the nested set of ensembles that has maximum performance on the hillclimb (validation) set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "References\n",
    "\n",
    "[1] Caruana, Rich, et al. \"Ensemble selection from libraries of models.\" in ICML. 2004."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Example of use:**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. Prepare Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.datasets import load_network_traffic\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = load_network_traffic(univariate=True)\n",
    "train_data, test_data = train_test_split(df, test_size=168, shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Create Experiment and Run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts import make_experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set parameter ``ensemble_size`` to control the number of ensemble models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "experiment = make_experiment(train_data=train_data.copy(),\n",
    "                             task='forecast',\n",
    "                             mode='dl',\n",
    "                             timestamp='TimeStamp',\n",
    "                             covariates=['HourSin', 'WeekCos', 'CBWD'],\n",
    "                             forecast_train_data_periods=24*12,\n",
    "                             max_trials=10,\n",
    "                             ensemble_size=5)\n",
    "model = experiment.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<bound method Pipeline.get_params of Pipeline(steps=[('data_preprocessing',\n",
       "                 TSFDataPreprocessStep(covariate_cleaner=CovariateTransformer(covariables=['HourSin',\n",
       "                                                                                           'WeekCos',\n",
       "                                                                                           'CBWD'],\n",
       "                                                                              data_cleaner_args={'correct_object_dtype': False,\n",
       "                                                                                                 'int_convert_to': 'str'}),\n",
       "                                       covariate_cleaner__covariables=['HourSin',\n",
       "                                                                       'WeekCos',\n",
       "                                                                       'CBWD'],\n",
       "                                       covariate_cleaner__data_cleaner_args={'correct_object_dtype': False,\n",
       "                                                                             'int_convert_to': 'str'},\n",
       "                                       covariate_cols=['HourSin', 'WeekCos',\n",
       "                                                       'CBWD'],\n",
       "                                       ensemble_size=5, freq='H',\n",
       "                                       name='data_preprocessing',\n",
       "                                       timestamp_col=['TimeStamp'],\n",
       "                                       train_data_periods=288)),\n",
       "                ('estimator',\n",
       "                 TSGreedyEnsemble(weight=[0.6, 0.0, 0.4, 0.0, 0.0], scores=[-2.410948636898898, -2.042171581044738, -1.9751808480925728, -1.9869096606536953, -1.991527303840764]))])>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.get_params"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Infer and Evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Metirc</th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mae</td>\n",
       "      <td>2.5579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mse</td>\n",
       "      <td>15.8978</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>rmse</td>\n",
       "      <td>3.9872</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mape</td>\n",
       "      <td>0.4066</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>smape</td>\n",
       "      <td>0.3483</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Metirc   Score\n",
       "0    mae  2.5579\n",
       "1    mse 15.8978\n",
       "2   rmse  3.9872\n",
       "3   mape  0.4066\n",
       "4  smape  0.3483"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test, y_test = model.split_X_y(test_data.copy())\n",
    "forecast = model.predict(X_test)\n",
    "results = model.evaluate(y_true=y_test, y_pred=forecast)\n",
    "results"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
